Pronunciation modeling for large vocabulary conversational speech recognition

نویسندگان

  • Kristine W. Ma
  • George Zavaliagkos
  • Rukmini Iyer
چکیده

In this paper, we address the issue of deriving and using more realistic pronunciations to represent words spoken in natural conversational speech. Previous approaches include using automatic phoneme-based rule-learning techniques [1, 2, 7], linguistic transformation rules [4, 8], and phonetically hand-labelled corpus [3] to expand the number of pronunciation variants per word. While rule-based approaches have the advantage of being easily extensible to infrequent or unobserved words, they su er from the problem of over generalization. Using hand-transcribed data, one can obtain a more concise set of new pronunciations but it cannot be extended to unobserved or infrequently occuring words. In this paper, we adopt the hand-labelled corpus scheme to improve pronunciations for frequent multi and single words occurring in the training data, while using the rule-based techniques to learn pronunciation variants and their weights for the infrequent words. Furthermore, we experiment with a new approach for speaker-dependent pronunciation modeling. The newly expanded dictionaries are evaluated on the Switchboard and Callhome corpora, giving a slight reduction in word recognition error rate.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition

Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...

متن کامل

Pronunciation Modeling for Large Vocabulary Speech Recognition by Arthur

The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy for automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the pronunciations of words. Others model the pronunciati...

متن کامل

Rate-of-speech Modeling for Large Vocabulary Conversational Speech Recognition

Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...

متن کامل

Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition

Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...

متن کامل

Modeling of Pronunciation, Language and Nonverbal Units at Conversational Russian Speech Recognition

The main problems of a conversational Russian speech recognition system development are variability of pronunciation, free word-order in sentences and presence of speech disfluencies. In the paper, pronunciation variability is modeled by creation of multiple word transcriptions. A syntacticstatistical language model that takes into account long-distant word dependencies is proposed for Russian ...

متن کامل

Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition

In spontaneous conversational speech there is a large amount of variability due to accents, speaking styles and speaking rates (also known as the speaking mode) [3]. Because current recognition systems usually use only a relatively small number of pronunciation variants for the words in their dictionaries, the amount of variability that can be modeled is limited. Increasing the number of varian...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998